ABSTRACT

The  universal  relation  model  aims  at  achieving  complete  access  path  independence  by  relieving  the 
user  of  the  need  for  logical  navigation  among  relations.  It  assumes  that  for  every  set  of  attributes 
there  is  a  basic  relationship  that  the  user  has  in  mind.  Two  fundamentally  different  approaches  to 
the  universal  relation  model  have  been  taken.  The  first  approach  sees  the  universal  relation  as  a 
user  view,  about  which  he  poses  queries.  Specifically,  a  representative  instance  is  constructed,  and 
queries  are  answered  based  on  its  non-null  part.  The  second  approach  sees  the  model  as  having 
query-processing  capabilities  that  relieve  the  user  of  the  need  to  specify  the  logical  access  path.  The 
relationship  between  the  user’s  view  and  the  computation  answering  a  query  is  a  central  issue  that 
systems  supporting  a  universal  view  of  data  must  handle. 

We  introduce  “lossless”  and  “monotone”  expressions  and  show  that  the  representative  instance 
construction  has  these  properties.  Also,  every  lossless  monotone  expression  produces  a  result  that  is 
a  subset  of  what  the  representative  instance  produces.  We  show  that  the  existence  of  any  first-order 
formula  to  simulate  the  representative  instance  is  equivalent  to  a  “boundedness”  condition  on  the 
dependencies  defining  the  database  scheme.  In  addition,  whenever  there  is  a  first-order  formula  to 
simulate  the  representative  instance,  then  we  can  do  so  with  an  expression  of  simple  form:  the  union 
of  tableau  mappings.  We  close  with  a  discussion  of  some  of  the  problems  with  the  representative 
instance  approach  that  suggest  better  universal  relation  models  may  be  possible. 


I. Underlying Assumptions

We assume the reader is familiar with relational database terminology to the extent covered in <Ma1, U1>
and with the idea of implementing a universal relation view of data as discussed in those works and <KS,
MW, M*, U2>.


Goals  of  Universal  Relation  Database  Systems 


A  primary  justification  used  by  Codd  for  the  introduction  of  the  relational  model  was  his  view  that  earlier 
models were not adequate to the task of boosting the productivity of programmers <C1, C2>. One of his
stated  motivations  was  to  free  the  application  programmer  and  the  end  user  from  the  need  to  specify  access 
paths  (the  so-called  “navigation  problem”).  A  second  motivation  was  to  eliminate  the  need  for  program 
modification  to  accommodate  changes  in  the  database  structure,  i.e.,  to  eliminate  access  path  dependence 
of  programs. 

† Supported by NSF grant IST-81-04834 and AFOSR grant 80-0212 in accordance with NSF agreement IST-80-21358. Some
of this material was developed while the first author was at SUNY, Stony Brook.

‡ Supported by AFOSR grant 80-0212 in accordance with NSF agreement IST-80-21358.
†† Supported by a Weizmann fellowship, a Fulbright award, and NSF grant DCS-80-12907.

Although it is a significant step forward, the relational model by itself fails to achieve complete freedom
from  user-supplied  navigation  and  from  access  path  dependence.  That  is,  the  relational  model  was  successful 
in  removing  the  need  for  physical  navigation;  no  access  paths  need  to  be  specified  within  the  storage  structure 
of  a  relation.  However,  the  relational  model  has  not  yet  provided  independence  from  logical  navigation,  since 
access  paths  among  the  relations  must  still  be  specified. 

For example, consider a database that has relations ED (Employee, Department) and DM (Department,
Manager). If we are interested in the relationship between employees and managers through departments,
then we have to specify the natural join of the ED and DM relations, projected onto EM. This expression
is an access path specification, and if the database were reorganized to have a single relation EDM, then
the program would have to be modified accordingly.

The universal relation model aims at achieving complete access path independence by letting us ask
the system in an appropriate language “give me the relationship between employees and their managers,”
expecting the system to figure out the correct access path for itself.† Of course, we cannot expect the system
always to select the correct relationship between employees and managers automatically, because the user
might have something other than the simplest connection, through departments, in mind, e.g., the manager
of the manager of the employee, or the manager(s) of all departments that come alphabetically later than the
department of the employee. We shall, in a universal relation system, have to settle for eliminating the need
for logical navigation along the most direct paths, while allowing the user to navigate in more convoluted
ways explicitly.

Unlike  the  relational  model,  the  universal  relation  model  was  not  introduced  as  a  clearly  defined  model, 
but  rather  evolved  during  the  1970’s  through  the  independent  work  of  several  researchers.  Moreover,  these 
researchers  were  not  only  concerned  with  the  universal  relation  as  a  data  model,  but  some,  like  <B>  and 
<BBG>,  saw  the  concept  primarily  as  a  vehicle  to  discuss  interrelational  data  dependencies.  The  issues 
of  dependency  satisfaction  are  rather  different  from  those  of  supporting  a  user  view,  and  the  multiplicity  of 
assumptions  led  to  considerable  confusion  and  to  some  attacks  on  the  universal  relation  model  (<K,  AP>) 
that  we  regard  as  not  germane  to  the  subject. 

† A similar approach was taken in the RENDEZVOUS system, which generates possible access paths and lets the user choose
the desired one <C*>.

Synopsis of Paper

We shall try in this paper to unify and clarify the various “universal relation assumptions,” as they pertain
to data modeling and the support of a user view. We first indicate the assumptions that are so fundamental
to the universal relation model that they are common to all the different approaches to the model. Then,
we consider what we regard as the two basic approaches to the model. The first approach sees the universal
relation  as  a  user  view,  about  which  he  poses  queries.  Thus,  in  order  to  answer  those  queries  we  must  first 
define  the  semantics  of  this  user  view.  The  second  approach  sees  the  model  as  having  a  query  processing 
ability  that  enables  the  user  to  pose  queries  about  the  actual  database  relations  (rather  than  some  abstract 
universal  relation),  without  specifying  an  access  path.  Thus,  in  order  to  answer  queries,  we  must  first  define 
a  computational  procedure  that  produces  the  desired  answer;  that  procedure  includes  inferring  the  access 
path. 

The  main  technical  contribution  of  this  paper  is  an  exploration  of  the  relationship  between  these  two 
basic  approaches.  We  establish  broad  conditions  under  which  the  approaches  are  the  same,  i.e.,  the  result 
of  the  query  on  the  abstract  universal  relation  is  equivalent  to  a  computation  in  relational  algebra. 

The  Universal  Connection  Assumption 

Perhaps the most basic assumption is that there is a universal relation scheme, a set of attributes about
which  queries  may  be  posed.  Further,  attributes  in  this  set  are  assumed  to  play  only  one  “role,”  and  puns 
are  not  allowed.  Thus,  an  attribute  like  NAME  cannot  stand  for  names  of  employees,  customers,  suppliers, 
and  managers  in  the  same  universal  relation  scheme. 

A  seldom  acknowledged  assumption,  but  one  that  underlies  all  known  universal  relation  systems,  is  that 
query processing consists of two steps:

1.  Binding.  From  the  set  of  attributes  X  mentioned  in  the  query,  form  a  relation  [X],  called  the  connection 
of X, over set of attributes X. Technically, [X] is a function from database states d to relations [X](d).
We  shall  use  [X]  to  stand  for  the  relation  [X](d)  when  the  database  state  d  is  understood  or  irrelevant. 
We  allow  the  possibility  that  [X]  is  the  empty  function,  i.e.,  no  connection  over  set  of  attributes  X  is 
permitted. 

2.  Evaluation.  Whatever  operations  must  be  applied  to  answer  the  query  are  then  applied  to  [X]. 

The  binding  and  evaluation  phases  are  independent.  Different  functions  [X]  can  be  used  to  produce  different 
relations  over  X,  without  changing  the  way  evaluation  works  on  the  resulting  relation,  although  the  answer 
may,  of  course,  be  changed. 

For example, the queries
retrieve (EMP)
where MGR = “Jones”

and
retrieve (MGR)
where EMP = “Smith”

are each answered by forming from the database some relation r over (EMP, MGR) in the binding phase.
For the evaluation phase, in the first case, we select from r those tuples with MGR = “Jones” and project
onto EMP. In the second case, we select for EMP = “Smith” and project onto MGR.
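
The separation of the two phases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stored relations ED and DM, the attribute name DEPT, the sample tuples, and the choice of [EMP, MGR] as the join of ED and DM projected onto (EMP, MGR) are all hypothetical. A real system might bind [X] quite differently without any change to the evaluation step.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicates (sorted for stable output)."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    return [{**a, **b} for a in r for b in s
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

def bind_emp_mgr(db):
    """Binding phase: one hypothetical choice of the connection [EMP, MGR]."""
    return project(natural_join(db["ED"], db["DM"]), ["EMP", "MGR"])

def evaluate(connection, where, retrieve):
    """Evaluation phase: select on the bound relation, then project."""
    selected = [row for row in connection
                if all(row[a] == v for a, v in where.items())]
    return [row[retrieve] for row in project(selected, [retrieve])]

# Hypothetical database state.
db = {
    "ED": [{"EMP": "Smith", "DEPT": "Toys"},
           {"EMP": "Lee", "DEPT": "Toys"},
           {"EMP": "Brown", "DEPT": "Shoes"}],
    "DM": [{"DEPT": "Toys", "MGR": "Jones"},
           {"DEPT": "Shoes", "MGR": "Green"}],
}

r = bind_emp_mgr(db)                                     # binding, done once
employees_of_jones = evaluate(r, {"MGR": "Jones"}, "EMP")
manager_of_smith = evaluate(r, {"EMP": "Smith"}, "MGR")
```

Both queries share the same bound relation r; only the selection and projection differ, which is the independence of binding and evaluation described above.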

Underlying the assumption that queries may be answered in this two-step way is the assumption that
for all sets of attributes X (or at least for many of them) there is a unique relationship on this set X
that the user has in mind. That does not mean there can be only one relationship on X, but rather, one
relationship  is  the  most  basic  one,  so  we  can  assume  that  this  relationship  is  what  the  user  has  in  mind  unless 
he  explicitly  specifies  otherwise.  In  the  above  example  of  employees,  departments  and  managers,  the  most 
basic  relationship  between  managers  and  employees  is  that  of  “manages,”  while  the  relationship  “manages 
the  manager  of”  we  intuitively  feel  is  less  basic.  This  underlying  assumption  is  called  the  relationship 
uniqueness assumption. The origin of the concept is in the “window” concept of <Ma2>.

In  practice,  systems  such  as  <U2,  KS,  MW,  M*>  that  support  a  universal  relation  view  of  data  permit 
queries  with  several  tuple  variables,  each  of  which  may  range  over  a  separate  “copy”  of  the  universal  relation. 
In  that  case,  there  is  one  set  of  attributes  associated  with  each  tuple  variable,  and  for  each  such  set,  X,  we 
allow the corresponding tuple variable to range over [X].

The  One  Flavor  Assumption 

There  is  another  rather  fundamental  assumption  that  underlies  much  of  the  work  on  universal  relation 
systems. Unfortunately, this assumption seems impossible to derive from more basic principles. It has
roughly  the  intellectual  status  of  a  belief  like  “entity-relationship  diagrams  are  adequate  to  model  the  real 
world.”  That  is,  there  is  substantial  empirical  evidence  for  the  assumption,  but  we  know  of  no  deep  reason 
why  it  must  be  valid. 

Our  assumption,  called  the  one  flavor  assumption,  is  that  all  tuples  in  [X]  represent  the  same  “flavor” 
of  relationship  among  the  attributes  in  X.  That  is,  the  meaning  to  the  user  of  the  fact  that  tuple  t  is  in  [X] 
does  not  depend  on  the  details  of  the  construction  that  put  it  in  [X].  Another  way  to  state  this  assumption 
is  that  we  must,  when  selecting  the  attributes  in  the  universal  relation  scheme  and  picking  an  algorithm  to 
compute [X], arrange that the same attribute does not play two different “roles” in one relation [X].

Evidently,  “flavor”  and  “role”  are  defined  only  intuitively.  We  regard  it  as  a  fundamental  hypothesis  of 
universal relation systems that it is always† possible to rename attributes so that the single-flavoredness of
any relationship will be apparent to the user. Perhaps an example will make these ideas clearer.

† Or at least sufficiently often that it is worth trying to implement a universal relation support system.

Example 1: Suppose we have a universal relation scheme ABC with relations AB, BC, and AC. Depending
on the functional dependencies assumed, different systems such as <U2, KS> would make different responses
to a query with X = AB. The two responses we are most likely to get from a universal relation system are
(i) Apply the query to the AB relation; that is, [AB] is simply the relation AB.
(ii) Take the union of the connection in AB and the connection through C, that is,
$[AB] = AB \cup \pi_{AB}(AC \bowtie BC)$

It appears that neither is right all the time, and the question of which is right hinges on whether tuples
in AB are of the same “flavor” as tuples in $\pi_{AB}(AC \bowtie BC)$. We shall consider two interpretations for ABC,
and try thereby to illustrate the distinction.

Suppose  that  A  =  COURSE,  B  =  STUDENT,  and  C  =  ENROLLMENT.  That  is,  we  may  imagine  that 
a  relation  (COURSE,  STUDENT),  representing  graduate  courses,  was  at  some  time  merged  with  a  network 
database  representing  the  many-many  relationship  between  undergraduate  courses  and  students  by  means 
of  dummy  ENROLLMENT  records,  each  of  which  is  owned  by  a  STUDENT  record  and  a  COURSE  record. 
Then we intuitively feel that the student-course pairs obtained from the (STUDENT, COURSE) relation
are of the same flavor as pairs that are related via the (STUDENT, ENROLLMENT) and (ENROLLMENT,
COURSE) relations. That is, whichever the source, the student is taking the course. Thus, in this case we
would prefer interpretation (ii) in response to a query like
retrieve  (COURSE) 

where  STUDENT  =  “Jones” 

The response according to choice (ii) would be all courses taken by Jones, not just the graduate courses.

Now consider an interpretation of ABC where A = STUDENT, B = LEVEL, and C = COURSE. The
(STUDENT,  LEVEL)  relation  gives  the  level  (Freshman,  etc.)  of  each  student,  the  (STUDENT,  COURSE) 
relation  tells  what  courses  the  student  is  taking,  and  the  (COURSE,  LEVEL)  relation  gives  the  nominal  level 
of  each  course. 

Here,  we  believe  that  the  proper  response  to  the  query 
retrieve  (LEVEL) 

where  STUDENT  =  “Jones” 

is  given  by  (i),  that  is,  just  tell  what  level  Jones  is  on,  not  the  set  of  levels  of  Jones  and  all  his  courses. 

Note that the two queries above are both of the same form: retrieve (B) where A = constant. We feel
the difference in proper interpretation is due to the fact that (student, level) pairs meaning the student is
at that level are not of the same flavor as pairs meaning the student is taking a course at that level. In
more  conventional  terms,  the  attribute  LEVEL  is  “semantically  overloaded,”  and  should  be  split  into  two 
attributes,  say  STUDENT_LEVEL  and  COURSE_LEVEL.  □ 
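
The two candidate connections of Example 1 can be compared on concrete data. The following Python sketch is illustrative only: the course numbers, the enrollment identifier e1, and the function names are hypothetical, and relations are modeled simply as sets of pairs.

```python
def through_c(ac, bc):
    """pi_AB(AC join BC): pairs (a, b) linked through a shared C value."""
    return {(a, b) for (a, c1) in ac for (b, c2) in bc if c1 == c2}

def connection_i(ab, ac, bc):
    """Response (i): [AB] is just the stored AB relation."""
    return set(ab)

def connection_ii(ab, ac, bc):
    """Response (ii): [AB] = AB union pi_AB(AC join BC)."""
    return set(ab) | through_c(ac, bc)

# Interpretation 1: A = COURSE, B = STUDENT, C = ENROLLMENT.
ab1 = {("CS500", "Jones")}            # graduate (COURSE, STUDENT) relation
ac1 = {("CS101", "e1")}               # (COURSE, ENROLLMENT)
bc1 = {("Jones", "e1")}               # (STUDENT, ENROLLMENT)
# retrieve (COURSE) where STUDENT = "Jones"
courses_i = {a for (a, b) in connection_i(ab1, ac1, bc1) if b == "Jones"}
courses_ii = {a for (a, b) in connection_ii(ab1, ac1, bc1) if b == "Jones"}

# Interpretation 2: A = STUDENT, B = LEVEL, C = COURSE.
ab2 = {("Jones", "Freshman")}         # (STUDENT, LEVEL)
ac2 = {("Jones", "CS500")}            # (STUDENT, COURSE)
bc2 = {("Graduate", "CS500")}         # (LEVEL, COURSE)
# retrieve (LEVEL) where STUDENT = "Jones"
levels_i = {b for (a, b) in connection_i(ab2, ac2, bc2) if a == "Jones"}
levels_ii = {b for (a, b) in connection_ii(ab2, ac2, bc2) if a == "Jones"}
```

Under the first interpretation, response (ii) returns both courses, which is what the user wants. Under the second, response (ii) mixes the student's own level with the level of a course he takes, which is the two-flavor problem the example describes.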

The one flavor assumption is a generalization of an assumption made by <B> to the effect that attribute
names were split adequately so that there would not be two different functional relationships between sets
of attributes. That is, if a functional dependency $X \to Y$ holds in a database scheme, and different ways of
deriving this dependency yield different values $y_1$ and $y_2$ for the attributes of Y associated with value x for
the attributes of X, then surely the tuples $xy_1$ and $xy_2$ over the set of attributes XY cannot be regarded as
of the same flavor in any reasonable sense. However, there could be one-flavor violations that do not involve
FD’s. For example, the (COURSE, LEVEL) relation in Example 1 could allow courses to be at several levels,
such as Senior/Grad. The FD $COURSE \to LEVEL$ disappears, but the one-flavor violation remains.

II. Universal Relations as the User View

The  other  notions  that  have  at  times  been  referred  to  as  “the  universal  relation  assumption”  fall  into  two 
categories.  First  come  assumptions  that  imply  data  is  treated  as  if  it  were  all  in  a  single  relation  over  all 
the  attributes.  Presumably,  [X]  will  be  the  projection  of  this  relation  onto  X.  The  second  group  consists  of 
various  assumptions  about  how  [X]  is  to  be  calculated,  without  explicit  reference  to  a  universal  relation.  The 
relationship  between  the  two  forms  of  definition  is  important  because  the  first  group,  defining  a  user  view,  is 
intellectually  justifiable,  while  the  second  group  provides  the  efficient  computation  needed  to  respond  to  the 
user  in  the  way  he  expects  based  on  his  view.  In  general,  computing  the  view  explicitly  is  far  too  expensive, 
and we must resort to computing [X] only in response to a query about X.

We  shall  deal  with  the  first  group  in  this  section.  Historically,  the  first  “assumption”  in  this  class  was 
the  pure  universal  relation  assumption.  By  this  approach,  we  restrict  ourselves  to  cases  where  there  exists 
a relation u over the universal set of attributes U, such that, for every relation scheme R in the database
scheme, the current relation for R is $\pi_R(u)$. In this case we take u as the user view, and [X] is simply
$\pi_X(u)$.
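
When no dependencies are imposed, a universal relation u with the required projections exists exactly when every stored relation equals the projection of the natural join of all the relations (if u exists, it is contained in that join, and the join never projects to more than the stored relations). The Python sketch below checks this condition. It is a minimal illustration that deliberately ignores dependencies; the function names and sample data are hypothetical.

```python
from functools import reduce

def to_rows(scheme, tuples):
    """Turn positional tuples into attribute-keyed rows."""
    return [dict(zip(scheme, t)) for t in tuples]

def natural_join(r, s):
    """Natural join: combine rows that agree on all shared attributes."""
    return [{**a, **b} for a in r for b in s
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

def project(rows, scheme):
    """Projection onto scheme, as a set of positional tuples."""
    return {tuple(row[a] for a in scheme) for row in rows}

def satisfies_pure_assumption(db):
    """True if each relation is a projection of the join of all relations.

    db maps a scheme (tuple of attribute names) to a set of tuples and is
    assumed to be non-empty. Dependencies are not considered here.
    """
    relations = [to_rows(scheme, tuples) for scheme, tuples in db.items()]
    joined = reduce(natural_join, relations)
    return all(project(joined, scheme) == set(tuples)
               for scheme, tuples in db.items())

# Jones's department has no manager tuple, so (Jones, Shoes) is "dangling".
db_dangling = {
    ("E", "D"): {("Jones", "Shoes"), ("Smith", "Toys")},
    ("D", "M"): {("Toys", "Green")},
}
# Adding a manager for Shoes removes the dangling tuple.
db_consistent = {
    ("E", "D"): {("Jones", "Shoes"), ("Smith", "Toys")},
    ("D", "M"): {("Toys", "Green"), ("Shoes", "Brown")},
}
dangling_ok = satisfies_pure_assumption(db_dangling)
consistent_ok = satisfies_pure_assumption(db_consistent)
```

The dangling-tuple case shows why the assumption is restrictive: an employee whose department has no recorded manager already rules out a pure universal relation.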

This  approach  has  the  advantage  that  we  can  view  the  database  as  a  physical  representation  of  a  single 
universal  relation,  and  it  facilitates  dealing  with  the  semantic  constraints  on  the  database.  In  fact,  it  was 
exactly  for  that  reason  that  the  pure  universal  relation  was  implicitly  taken  for  the  first  time  by  Bernstein 
<B>,  when  he  developed  a  design  theory  for  relational  databases  with  FD’s. 

Clearly,  in  order  for  this  approach  to  work,  we  have  to  ensure  that  the  universal  relation  is  unique,  and 
that  we  have  an  effective  way  of  computing  it.  These  issues  were  investigated  in  <BR,  MMSU,  Ri,  VI  >. 
However, even with these issues solved, it was generally accepted that the pure universal relation approach is
not widely enough applicable <BBG>. In fact, in <HLY> it is shown that testing whether a database
satisfies the assumption or whether an update maintains the assumption is NP-complete, i.e., probably
exponential in the size of the database.

A much more promising assumption, which has become known as the weak universal relation assumption,
is that an appropriate universal relation to serve as a user view is any relation u over U such that

1. u satisfies whatever dependencies are given, and

2. $\pi_R(u)$ is a superset of the current relation for R in the database, for each relation scheme R.

Since  we  cannot  know  which  of  the  infinity  of  weak  universal  relations  truly  represents  the  “real  world” 
at  any  given  moment,  one  assumes  that  the  only  facts  that  can  be  deduced  about  the  universal  relation  from 
the  given  relations  of  the  database  are  those  that  hold  in  all  weak  universal  relations.  That  is,  we  take  the 
user’s view to be a collection of sets of tuples, one for each set of attributes X. Let $S_X$ be the set of tuples
for set of attributes X. In $S_X$ appear exactly those tuples t such that for every weak instance u, there is a
tuple t' in u that agrees with t on X.
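
The two conditions defining a weak universal relation are easy to test for a given candidate u. The Python sketch below does so for the special case where the dependencies are functional dependencies. The representation (tuples over a fixed attribute order, FDs as pairs of attribute tuples) and all names and data are assumptions made for illustration. Note that $S_X$ itself cannot be obtained this way, since there are infinitely many weak instances to intersect.

```python
def satisfies_fd(u, attrs, lhs, rhs):
    """True if no two tuples of u agree on lhs but differ on rhs."""
    seen = {}
    for t in u:
        row = dict(zip(attrs, t))
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(u, attrs, scheme):
    """Projection of u (tuples over attrs) onto scheme."""
    idx = [attrs.index(a) for a in scheme]
    return {tuple(t[i] for i in idx) for t in u}

def is_weak_instance(u, attrs, db, fds):
    """Condition 1: u satisfies the FDs. Condition 2: each projection of u
    is a superset of the corresponding stored relation."""
    return (all(satisfies_fd(u, attrs, lhs, rhs) for lhs, rhs in fds)
            and all(set(rel) <= project(u, attrs, scheme)
                    for scheme, rel in db.items()))

attrs = ("E", "D", "M")
db = {("E", "D"): {("Jones", "Shoes")}, ("D", "M"): {("Toys", "Green")}}
fds = [(("E",), ("D",)), (("D",), ("M",))]

u1 = {("Jones", "Shoes", "Brown"), ("Smith", "Toys", "Green")}
u2 = {("Jones", "Shoes", "Brown")}                 # loses the DM tuple
u3 = u1 | {("Jones", "Toys", "Green")}             # violates E -> D

results = [is_weak_instance(u, attrs, db, fds) for u in (u1, u2, u3)]
```

Here u1 invents a manager Brown and an employee Smith that the database does not mention; it is still a weak instance, which is why only facts holding in all weak instances are taken as deducible.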

Weak  instances  were  first  studied  by  <H>  as  a  means  to  define  satisfaction  of  functional  dependencies 
by  a  collection  of  relations.  They  have  since  been  studied  by  <Me>  as  a  way  to  define  information  content 
in  relational  database  schemes  and  by  <Sal,  Sa2,  Y>  as  a  model  of  what  the  user  should  see  as  the  universal 
relation  about  which  he  is  to  pose  queries. 

An  important  property  of  weak  instances  is  that  as  long  as  the  dependencies  are  of  a  type  for  which  the 
chase  process  (<ABU,  MMS>)  is  a  partial  decision  procedure,  even  dependencies  as  general  as  embedded 
implicational  dependencies  <F,  BV,  YP>,  we  can  construct  from  a  set  of  database  relations  a  single  relation 
that embodies the information present in all weak instances. The desired relation, which is the relation $S_X$
mentioned  above,  is  constructed  as  follows. 

1. We construct a relation u over U. To begin, for each relation r over one of the database relation schemes,
say R, and for each tuple t in r, place in u a tuple that agrees with t on the attributes of R, and that
in the other attributes has a new “null” symbol appearing nowhere else. We use $\perp_i$ for nulls.

2.  Apply  dependencies  to  “chase”  u,  that  is,  generate  new  tuples  and  equate  symbols,  as  required  by  the 
dependencies.  However,  when  equating  a  null  and  nonnull  symbol,  replace  the  null  by  the  nonnull.  For 
certain  kinds  of  dependencies,  in  particular  full  dependencies,  where  no  new  symbols  are  generated  by 
the  chase,  this  step  terminates.  However,  we  introduce  new  nulls  when  embedded  dependencies  are 
applied,  so  the  process  may  not  terminate  if  there  are  embedded  dependencies.  In  this  case  the  result 
of  the  present  step  should  be  taken  as  the  infinite  relation  that  results  from  chasing  “forever.” 

3. If during the chase process, we are ever forced to equate two symbols, neither of which is null, then
the process stops. We interpret this situation as saying that the actual relations in the database do
not  satisfy  the  given  dependencies.  To  be  exact,  in  this  case  there  is  no  universal  relation  that  satisfies 
the  dependencies  and  produces  supersets  of  the  database  relations  when  projected  onto  the  schemes  of 
those  relations;  that  is,  there  is  no  weak  instance. 
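
The three steps above can be sketched in Python for the special case where all dependencies are functional dependencies, so that the chase only equates symbols and always terminates. This is a minimal illustration, not a general chase: the class and function names, the row representation, and the sample data are all assumptions, and tuple-generating or embedded dependencies are not handled.

```python
import itertools

class Null:
    """A marked null; every instance is a distinct symbol."""
    _ids = itertools.count(1)
    def __init__(self):
        self.id = next(Null._ids)
    def __repr__(self):
        return f"null{self.id}"

class Inconsistent(Exception):
    """Raised when two distinct non-null symbols would have to be equated."""

def pad(universe, db):
    """Step 1: extend each stored tuple to the universe with fresh nulls."""
    rows = []
    for scheme, tuples in db.items():
        for t in tuples:
            known = dict(zip(scheme, t))
            rows.append([known[a] if a in known else Null() for a in universe])
    return rows

def chase(universe, rows, fds):
    """Steps 2 and 3, for functional dependencies only."""
    pos = {a: i for i, a in enumerate(universe)}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1, r2 in itertools.combinations(rows, 2):
                if any(r1[pos[a]] != r2[pos[a]] for a in lhs):
                    continue                      # rows disagree on lhs
                for a in rhs:
                    x, y = r1[pos[a]], r2[pos[a]]
                    if x == y:
                        continue
                    if not isinstance(x, Null) and not isinstance(y, Null):
                        raise Inconsistent(f"{lhs} -> {rhs}: {x!r} vs {y!r}")
                    # Replace the null by the other symbol everywhere.
                    old, new = (x, y) if isinstance(x, Null) else (y, x)
                    for row in rows:
                        for i, v in enumerate(row):
                            if v is old:
                                row[i] = new
                    changed = True
    return rows

U = ("E", "D", "M")
db = {("E", "D"): [("Jones", "Shoes")], ("D", "M"): [("Shoes", "Green")]}
result = chase(U, pad(U, db), [(("D",), ("M",))])

bad_db = {("E", "D"): [("Jones", "Shoes"), ("Jones", "Toys")]}
try:
    chase(U, pad(U, bad_db), [(("E",), ("D",))])
    failed = False
except Inconsistent:
    failed = True
```

In the first run the dependency D → M fills in Green as the manager in the Jones row. In the second, E → D would force Shoes = Toys, so the sketch reports that no weak instance exists.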

The relation constructed as above from a list of database relations $r_1, \ldots, r_n$ is called the representative
instance for these relations (with respect to the given dependencies), and we denote it by $RI(r_1, \ldots, r_n)$. The
weak  universal  relation  assumption  says  that  the  representative  instance  is  a  suitable  model  of  the  data 
as  stored  in  one  relation.  The  representative  instance  differs  from  a  pure  universal  relation  in  that  the 
latter  consists  only  of  total  tuples.  In  contrast,  the  representative  instance  extends  tuples  of  one  relation 
to  have  nonnull  values  in  whatever  components  are  justified  by  the  dependencies.  However,  in  general,  the 
representative  instance  consists  of  relationships  defined  on  subsets  of  the  universal  set  of  attributes — subsets 
that  are  as  large  as  make  sense. 

Whenever a tuple of u has nonnull symbols in the components for set of attributes X, these values are
presumed to be related in a significant way, and therefore belong in [X]. Put another way, if we let $\pi^{\downarrow}_R(u)$
stand for the projection of u onto R after throwing away all tuples that have nulls in one or more of the
components corresponding to the attributes in R, then the weak universal relation assumption says that
$[X] = \pi^{\downarrow}_X(u)$, where u is the representative instance defined above. We call $\pi^{\downarrow}$ the restricted projection.

As this way of defining connections is only one of many possibilities, we should, strictly speaking, use
a special notation for this definition. We shall use $[X]^{A}$ to denote the set of tuples obtained by computing
the representative instance using set of dependencies A, and then performing the restricted projection onto
X. A fundamental property of the representative instance is that it always produces the intersection of all
the weak instances, that is, the sets $S_X$ consisting of those tuples that are in the projection of every weak
instance onto X. While this relationship is generally believed, we provide a proof in the following theorem.

Theorem 1: For all X, $[X]^{A} = S_X$.

Proof: Since $RI(r_1, \ldots, r_n)$ is a weak universal relation, it is clear that $S_X \subseteq [X]^{A}$. It remains to prove
the other direction.

Let u be the universal relation constructed in the beginning of the construction of the representative
instance. That is, u is constructed by taking all tuples in the database and padding them with distinct nulls.
Now let v be any weak universal relation for the database. That is, v satisfies the dependencies in A, and
$\pi_R(v)$ is a superset of the current relation for R in the database for each relation scheme R. It is easy to
see that we can define a mapping h on the entries in u that is the identity on all entries that come from the
database (i.e., the non-null entries), such that $h(u) \subseteq v$. Now we can show by induction on the chase steps,
as in <BV, SU, MMS>, that application of the dependencies preserves this statement. That is, let $u^*$ be a
universal relation in an intermediate step of the algorithm. Then we can define a mapping h on the entries
in $u^*$ that is the identity on all entries that come from the database, and such that $h(u^*) \subseteq v$.

It follows that we also have such a mapping h for which $h(RI(r_1, \ldots, r_n)) \subseteq v$. Now let t be a tuple in
$RI(r_1, \ldots, r_n)$ with non-null entries in the components corresponding to attributes in X. Then we have that
h(t) is in v and h(t)[X] = t[X]. It follows that t[X] is in $S_X$. Thus, $[X]^{A} \subseteq S_X$. □

E | C | D | M
Jones | Ann | $\perp_1$ | $\perp_2$
Jones | Jim | $\perp_3$ | $\perp_4$
Green | Sue | $\perp_5$ | $\perp_6$
Jones | $\perp_7$ | Shoes | $\perp_8$
Smith | $\perp_9$ | Toys | $\perp_{10}$
$\perp_{11}$ | $\perp_{12}$ | Toys | Green

Fig. 1. Initial table.

Example 2: Suppose we have attributes E (employee), C (child), D (department), and M (manager), with
relation schemes EC, ED, and DM with current values

EC:
E | C
Jones | Ann
Jones | Jim
Green | Sue

ED:
E | D
Jones | Shoes
Smith | Toys

DM:
D | M
Toys | Green

Suppose that the given dependencies are $E \twoheadrightarrow C$, $E \to D$, and $D \to M$.† The initial value of u constructed by
step (1) above is shown in Fig. 1.

† The first of these is redundant and is included only to illustrate a point.

We could apply $E \twoheadrightarrow C$ to deduce that the tuple
(Jones, Ann, $\perp_3$, $\perp_4$)

was in u. However, we would later discover that this tuple contains the same information as the first tuple
in Fig. 1, so we shall instead apply the FD’s, discovering after we do that the MVD $E \twoheadrightarrow C$ is satisfied as a
result (which must be the case because the two FD’s logically imply the MVD here). Thus, $E \to D$ applied
to rows 1, 2, and 4 of Fig. 1 tells us that $\perp_1 = \perp_3 =$ Shoes. Then $D \to M$ tells us that $\perp_{10} =$ Green,
and $\perp_2 = \perp_4 = \perp_8$; we shall replace them all by $\perp_2$. At this point no further changes can be made.
No nonnull symbols were equated, so the data is deemed to satisfy the given dependencies. Figure 2 shows
the resulting representative instance.

E | C | D | M
Jones | Ann | Shoes | $\perp_2$
Jones | Jim | Shoes | $\perp_2$
Green | Sue | $\perp_5$ | $\perp_6$
Jones | $\perp_7$ | Shoes | $\perp_2$
Smith | $\perp_9$ | Toys | Green
$\perp_{11}$ | $\perp_{12}$ | Toys | Green

Fig. 2. The representative instance.

The representative instance serves as an adequate universal relation, telling us all the facts that
can be deduced from the given data and the dependencies. For example, we know that Ann is the child of
someone in the Shoe Department, because [CD] = {(Ann, Shoes), (Jim, Shoes)}. □
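
The restricted projection can be checked mechanically against the representative instance of Example 2. In the Python sketch below the instance is written out by hand rather than computed, and the encoding is an assumption made for illustration: constants are strings, and the null $\perp_k$ is encoded as the integer k.

```python
ATTRS = ("E", "C", "D", "M")

# Representative instance of Example 2; integer k stands for the null with index k.
REP_INSTANCE = [
    ("Jones", "Ann", "Shoes", 2),
    ("Jones", "Jim", "Shoes", 2),
    ("Green", "Sue", 5, 6),
    ("Jones", 7, "Shoes", 2),
    ("Smith", 9, "Toys", "Green"),
    (11, 12, "Toys", "Green"),
]

def restricted_projection(rows, attrs, x):
    """Project onto x, keeping only rows with no null in any x component."""
    idx = [attrs.index(a) for a in x]
    return {tuple(row[i] for i in idx) for row in rows
            if not any(isinstance(row[i], int) for i in idx)}

cd = restricted_projection(REP_INSTANCE, ATTRS, ("C", "D"))
em = restricted_projection(REP_INSTANCE, ATTRS, ("E", "M"))
cm = restricted_projection(REP_INSTANCE, ATTRS, ("C", "M"))
```

With this data, cd reproduces the connection [CD] given in the example, em contains only (Smith, Green) because the manager of Shoes is unknown, and cm is empty because no child can be linked to a known manager.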

Representative  Instances  and  Logical  Theories 

Another way to look at the weak universal relation approach is to view the database as a logical theory.
From this point of view (<GaMi, Ko, Re>) a database is a set of sentences in a first order language without
function symbols, whose relation names denote the relations of the database and whose constants denote
the elements of the domain of the database. Let T be such a theory. Then, in answer to a query about a
relation R, we must produce the set of tuples $\{t \mid T \models R(t)\}$, i.e., the set of all tuples whose membership
in the relation for R is implied by the theory. This approach has the advantage of working even in the case
that the given constraints are not dependencies, so the chase would not be applicable.

Let us now see what the theory T is in our case. Our construction is similar to the construction in
<GrMe>,  though  they  had  a  somewhat  different  intention  in  mind.  The  language  we  use  has  a  relation 
name  X  for  every  attribute  set  X.  In  particular,  it  has  a  relation  name  R  for  every  relation  scheme  R  in 
the  database  scheme,  and  it  has  the  universal  relation  name  U.  For  constants  we  use  the  elements  of  the 
database,  which  denote  themselves. 

The  theory  has  five  kinds  of  sentences.  First,  we  have  a  set  DB  of  atomic  sentences  describing  the 
relations  in  the  database.  That  is,  for  every  tuple  t  in  some  relation  r  over  relation  scheme  7?,  we  have  in  DB 
the  sentence  R{t).  Secondly,  we  have  a  set  INC  of  sentences  saying  that  the  relation  r  for  a  relation  scheme 
R  is  included  in  the  projection  of  the  universal  relation  on  /?.  For  example,  suppose  that  U  =  ABCD  and 
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K  =  BCD,  Then  put  in  INC  the  sentence 

(V6)(Vc)(Vti)(3a)(i?(6,  c,  d)  t/(a,  6,  c,  d)) 

Thirdly, we have a set CON of sentences saying that the relation x for an attribute set X contains the
projection of the universal relation on X. In the above example we have the sentence

$$(\forall a)(\forall b)(\forall c)(\forall d)\,(U(a, b, c, d) \rightarrow R(b, c, d))$$

In addition we have the set DIS stating that all elements are distinct, i.e., for each pair of distinct elements
a and b we have the sentence $a \neq b$. Finally, we have A, which is the given set of dependencies written as
first-order sentences (<BV, F>) with the universal relation name as the only relation name. The theory T
is taken to be the union

$$DB \cup INC \cup CON \cup DIS \cup A$$

Consider now a model of this theory. Since all constants must be interpreted as distinct elements (by
the sentences in DIS), we can assume that they are interpreted as themselves. Let r be a relation in the
database for the relation scheme R, and let r' be the interpretation of R in the model. For every tuple t in r
we have a sentence R(t) in DB, so t must belong to r'. Let u be the interpretation of the universal relation
U. By the sentences in INC, we have $r' \subseteq \pi_R(u)$, and consequently, $r \subseteq \pi_R(u)$. Also, u satisfies A. It
follows that u is a weak universal relation for the database. Conversely, given a weak universal relation u
for the database, we can construct a model for the theory, by taking u as the interpretation of U, and by
taking $\pi_X(u)$ as the interpretation of X for each attribute set X. We have just proven:

Theorem 2: A database satisfies the given dependencies if and only if its theory is satisfiable. □
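
As a concrete reading of the argument above, the following Python sketch checks the two conditions that make u a weak universal relation: every stored relation is contained in the corresponding projection of u, and u satisfies the dependencies. It is an illustration only, under assumptions not made in the text: the dependencies are restricted to functional dependencies, attributes are single letters, and the names `project`, `satisfies_fd`, and `is_weak_universal` are hypothetical.

```python
def project(rel, attrs):
    """Project a list of dict-tuples onto the attributes in `attrs`."""
    return {tuple((a, t[a]) for a in attrs) for t in rel}


def satisfies_fd(u, lhs, rhs):
    """True if relation u satisfies the functional dependency lhs -> rhs."""
    seen = {}
    for t in u:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        # Two tuples that agree on lhs must agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


def is_weak_universal(u, db, fds):
    """u: list of dicts over U; db: {scheme: list of dicts}; fds: [(lhs, rhs)].

    Assumption: only FDs are checked, not general dependencies.
    """
    contained = all(project(r, scheme) <= project(u, scheme)
                    for scheme, r in db.items())
    return contained and all(satisfies_fd(u, lhs, rhs) for lhs, rhs in fds)


db = {"AB": [{"A": 1, "B": 2}], "AC": [{"A": 1, "C": 3}]}
fds = [("A", "B")]
print(is_weak_universal([{"A": 1, "B": 2, "C": 3}], db, fds))
```

In the terms of Theorem 2, a database for which some such u exists is one whose theory is satisfiable; the sketch only verifies a candidate u and does not search for one.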

Now, given an attribute set X, we define $[X]_T$ to be $\{t \mid T \models X(t)\}$. The next theorem shows that
$[X]_T$ is the desired relation.

Theorem 3: $[X]_T = S_X$.

Proof: We have seen that every model of T corresponds to a weak universal relation for the database and
vice versa. Let t be a tuple in $S_X$. Then in every weak universal relation u there is a tuple t' such that
t = t'[X]. Thus, in every model of T there is such a tuple. By the sentences in CON, in every model of
T, t is contained in the interpretation of X. Thus, $T \models X(t)$, and t is in $[X]_T$. The opposite direction is
similar. □

III. Computational Definitions of the Universal Relation

Now we come to a variety of assumptions that start from the point of view that the user may query about
any set of attributes X, and the system will perform some computation on the relations of the database
to  compute  [X].  This  approach  was  originally  taken  by  <B>,  where  it  is  implicitly  assumed  that  [X] 
is  computed  by  joins  simulating  functional  dependencies.  It  was  taken  again  in  <ABU>,  where  it  was 
assumed  that  [X]  should  be  computed  by  a  natural  join.  There,  the  notion  of  a  “correct  join”  was  identified 
with  the  notion  of  lossless  join;  in  fact  the  lossless  join  seems  to  be  the  basic  computational  procedure  in  all 
works  taking  the  computational  approach. 

The  first  implemented  system  to  behave  this  way,  the  UNIX  command  q  <Ah>,  uses  a  list  of  sets  of 
attributes;  the  list  is  established  once  and  for  all,  for  each  database.  [X]  is  computed  by  searching  the  list 
for  the  first  set  of  attributes  Y  that  includes  X,  computing  a  relation  over  Y  in  some  specified  way,  and 
projecting  onto  X.  If  no  set  on  the  list  is  found,  then  the  join  of  all  the  relations  is  taken  and  projected. 
Several  papers  have  investigated  ways  to  find  “correct”  joins,  and  if  possible,  “optimal”  joins  to  compute 
[X]  for  a  given  attribute  set  X.  See  for  example  <Ar,  L,  V2>. 

Most other proposals compute [X] by taking the union of one or more terms, each of which is a lossless
join. (In practice, the sets of relations on the list in a q application are likely to be lossless as well.) For
example, <O, Sa1, Sa2, KS> discuss taking the union of extension joins as a way to compute [X]. The
paper <MU> proposes taking the union of joins each of which lives inside some “maximal object,” where
the join is not only lossless, but where the losslessness follows from particular rules, like the FD or MVD
rules for testing lossless joins. Variants of this approach have been implemented in System/U <U2> and
PITS <MW, M*>.

The idea of using for [X] a union of projections of lossless joins can be generalized considerably. For
example,  the  System/U  algorithm,  since  it  requires  a  lossless  join  of  “objects,”  which  may  be  proper  subsets 
of  relations,  really  uses  a  union  of  projections  of  lossless  joins  of  projections  of  relations. 

Lossless  Expressions 

Suppose $E(R_1, \ldots, R_n)$ is any expression whose operands are relation schemes that are subsets
of some universal scheme U. Suppose also that the result of E is a relation over set of attributes $X \subseteq U$.
We say E is lossless with respect to a set of dependencies A if for each relation u over U that satisfies A,
when we substitute for each operand $R_i$ of E the relation $\pi_{R_i}(u)$, the value of E is a subset of $\pi_X(u)$. For
the usual sorts of expressions we deal with, such as joins and the “tableau mappings” to be defined formally
later, it is easy to show that containment always holds in the opposite direction, so “is a subset of” could be
replaced by “equals” in the definition of losslessness.

Example 3: Suppose we have a universal relation scheme EDOP, representing employees, departments,
offices and phones. We shall assume the FD $E \rightarrow D$ but no other dependencies except the join dependency

$$\bowtie\{ED, EO, EP, OP\}$$

that we suppose corresponds to the formal definition of the universal relation according to the style of
<FMU>. We shall, for this example, take the relation schemes to be EDP, EO, and OP.

Suppose we take X = DO, that is, we are interested in the department-office relationship. One lossless
expression we might use is the join of all the relations projected onto DO, that is

$$\pi_{DO}(EDP \bowtie EO \bowtie OP)$$

The losslessness of this expression follows from the given join dependency.

A simpler lossless expression is $\pi_{DO}(EDP \bowtie EO)$. The losslessness of this expression follows from the
FD $E \rightarrow D$, and it is important to note that $EDP \bowtie EO$ is not a lossless join. That is, to test the losslessness
of $\pi_{DO}(EDP \bowtie EO)$ we must check whether it is contained in the expression $\pi_{DO}(EDOP)$, using the test
of <ASU>. The tableaux for the two expressions are

    E   D   O   P
    -------------
        d   o
    -------------
    e   d       p
    e       o

and

    E   D   O   P
    -------------
        d   o
    -------------
        d   o

respectively. We here and throughout the paper use blanks for symbols that appear nowhere else in the
given tableau.

The first of these tableaux can be “chased” using the FD $E \rightarrow D$, yielding

    E   D   O   P
    -------------
        d   o
    -------------
    e   d       p
    e   d   o

whereupon the containment $\pi_{DO}(EDP \bowtie EO) \subseteq \pi_{DO}(EDOP)$ follows, since the row of the tableau for
$\pi_{DO}(EDOP)$ can map into the second row of the above tableau.

A third lossless expression we might use is $\pi_{DO}(\pi_{ED}(EDP) \bowtie EO)$. We can show this expression to be
lossless by an argument similar to the one used above. □

In fact, we shall later prove a general rule for testing the losslessness of expressions, such as those in
Example  3,  that  are  representable  as  tableaux.  Simply  chase  the  tableau,  and  see  if  a  row  with  all  the 
distinguished  symbols  is  created.  Note  how  this  rule  generalizes  the  lossless  join  test  of  <ABU>,  since 
there  we  want  a  row  with  the  distinguished  symbol  in  every  column.  In  the  present  case,  we  do  not  care 
about symbols not in the X-columns, where X is the scheme for the result of the expression.
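
The rule just stated can be sketched in Python for the special case where the expression is a projection of a join and the dependencies are functional dependencies only. This is a simplified illustration, not the general procedure: join dependencies and other implicational dependencies are not handled, attributes are assumed to be single letters, and the names `chase_fds` and `is_lossless` are hypothetical.

```python
from itertools import combinations


def chase_fds(rows, fds):
    """Chase tableau rows (dicts attr -> symbol) with FDs given as (lhs, rhs)."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1, r2 in combinations(rows, 2):
                if all(r1[a] == r2[a] for a in lhs):
                    for a in rhs:
                        s1, s2 = r1[a], r2[a]
                        if s1 != s2:
                            # Prefer to keep a distinguished symbol.
                            keep, drop = (s1, s2) if s1[0] == "dist" else (s2, s1)
                            for row in rows:
                                for attr in row:
                                    if row[attr] == drop:
                                        row[attr] = keep
                            changed = True
    return rows


def is_lossless(universe, schemes, fds, x):
    """Test whether the projection onto x of the join of `schemes` is lossless."""
    rows = [{a: ("dist", a) if a in scheme else ("new", a, i) for a in universe}
            for i, scheme in enumerate(schemes)]
    chase_fds(rows, fds)
    # Lossless if some row carries distinguished symbols in all X-columns.
    return any(all(row[a] == ("dist", a) for a in x) for row in rows)


# Example 3: the projection onto DO of EDP joined with EO, with the FD E -> D.
print(is_lossless("EDOP", ["EDP", "EO"], [("E", "D")], "DO"))
# The join itself (X = EDOP) is not lossless under the same FD.
print(is_lossless("EDOP", ["EDP", "EO"], [("E", "D")], "EDOP"))
```

Taking X to be all of U recovers the ordinary lossless join test, which is why the second call fails for the join EDP, EO even though its projection onto DO passes.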

Monotone  Expressions 

There  is  another  condition  on  expressions  that  plays  a  role  in  characterizing  computational  ways  to  define 
universal relations. Say an expression $E(R_1, \ldots, R_n)$ is monotone if whenever $r_i \subseteq s_i$ for $1 \le i \le n$,
it follows that $E(r_1, \ldots, r_n) \subseteq E(s_1, \ldots, s_n)$. For example, all expressions of relational algebra using the
operators  select,  project,  Cartesian  product,  and  union,  i.e.,  all  those  that  do  not  involve  set  difference,  are 
monotone. 
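
A small Python sketch may make the distinction concrete. It represents a tuple as a frozenset of (attribute, value) pairs, builds a project-join expression, and contrasts it with set difference. The helper names `row`, `project`, `join`, and `project_join` and the sample data are assumptions made for this illustration.

```python
def row(**values):
    """Build a tuple as a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def join(r, s):
    """Natural join: combine tuples that agree on their common attributes."""
    out = set()
    for t1 in r:
        d1 = dict(t1)
        for t2 in s:
            if all(d1.get(a, v) == v for a, v in t2):
                out.add(t1 | t2)
    return out


def project_join(ed, eo):
    """A monotone expression: the join of ED and EO projected onto DO."""
    return project(join(ed, eo), {"D", "O"})


r1 = {row(E="ann", D="toys")}
s1 = r1 | {row(E="bob", D="shoes")}
r2 = {row(E="ann", O="rm1")}
s2 = r2 | {row(E="bob", O="rm2")}

# Enlarging the operands can only enlarge the result of a project-join.
print(project_join(r1, r2) <= project_join(s1, s2))
# Set difference is not monotone in its second operand.
print((r1 - set()) <= (r1 - r1))
```

The second check fails because growing the subtracted relation shrinks the result, which is the reason set difference is excluded above.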

Motivation  for  Losslessness  and  Monotonicity 

There  is  a  natural  motivation  for  restricting  our  attention  to  lossless,  monotone  expressions.  Let  us  first 
consider  losslessness. 

Suppose that the user actually has a universal relation u in mind. Then he would like to have $[X] = \pi_X(u)$.
However, because of the structure of the database, the user cannot store it, and he is forced instead
to store its projections $\pi_{R_1}(u), \ldots, \pi_{R_n}(u)$ onto the relation schemes of the database scheme. Thus, he would
like the function f that computes [X] to be such that $f(\pi_{R_1}(u), \ldots, \pi_{R_n}(u)) = \pi_X(u)$. Well, perhaps asking
for “the whole truth” is too much, because the database scheme may not support the reconstruction of
certain connections in the universal relation. But surely the user would like “nothing but the truth”; that
is, $f(\pi_{R_1}(u), \ldots, \pi_{R_n}(u)) \subseteq \pi_X(u)$. In other words, f should be lossless.

Indeed, the function f defined by the representative instance is lossless. To see informally why this is
so, suppose we start with a universal relation $u_1$ satisfying a set of dependencies A, and we project $u_1$ onto
some relation schemes to get relations $r_1, \ldots, r_k$. Whatever the chase process for the dependencies in A is,
we expect that no combination of nonnull symbols will be generated by the chase unless it is a consequence
of A. Since $u_1$ has all the combinations of symbols found among the tuples in the $r_i$'s and satisfies A, every
combination found in the result, $u_2$, of the chase, will be found in $u_1$. Thus

$$\pi{\downarrow}_X(u_2) \subseteq \pi_X(u_1)$$

which proves that the representative instance construction is lossless.

Note that we cannot in general prove that equality holds, since $u_1$ may contain tuples in its projection
onto X that cannot be reconstructed by the chase. However, there is a broad class of dependencies for which
we can almost prove equality. An implicational dependency is said to be typed if symbols do not appear
in more than one column. That is, the domains for the various attributes are regarded as disjoint, and
dependencies can neither be predicated on the same value appearing in more than one column, nor can they
infer the presence in one column of a symbol appearing in another.

Theorem 4: Let A be a set of typed implicational dependencies, $u_1$ a universal relation satisfying A, and
$u_2$ the result of projecting $u_1$ onto relation schemes $R_1, \ldots, R_n$ and constructing the representative instance
from these projections. Then $\pi{\downarrow}_X(u_2)$ is either empty or is exactly $\pi_X(u_1)$. The latter occurs exactly when
the expression $\pi_X(\bowtie_{1 \le i \le n} R_i)$ is lossless with respect to A.

Proof: We argued above why $\pi{\downarrow}_X(u_2) \subseteq \pi_X(u_1)$. Conversely, suppose $\pi{\downarrow}_X(u_2)$ is not empty. Then, A
allows us to infer, from the fact that certain tuples are in the relations over the $R_i$'s, that some tuple t with
non-null components for X exists in the representative instance $u_2$.

Now consider any tuple s in $u_1$, and consider its projection onto the relation schemes. Since the
dependencies in A are typed, all equalities that A requires to infer t are satisfied by the projections of s,
since all these projections agree in components that they have in common. Therefore, A will imply the
existence in $u_2$ of some tuple with non-null components for X, and these components must agree with the
corresponding components of s, since A is typed. Thus, s[X] is in $\pi{\downarrow}_X(u_2)$. Also, taking the projections of a
tuple on $R_1, \ldots, R_n$, padding them with nulls, and chasing them with A, is exactly the test for losslessness
of the expression $\pi_X(\bowtie_{1 \le i \le n} R_i)$. Since we showed that we get a tuple with non-null components for X, it
follows that the expression is lossless.

Finally, if the expression is lossless, then the losslessness test will produce a tuple with non-null
components for X, and that means that, starting with a tuple s from $u_1$, we get $\pi_X(s)$ in $\pi{\downarrow}_X(u_2)$. □

Corollary 1: Let A be a set of implicational dependencies, $u_1$ a universal relation satisfying A, and $u_2$
the result of projecting $u_1$ onto relation schemes $R_1, \ldots, R_n$ and constructing the representative instance
from these projections. Then $\pi{\downarrow}_X(u_2) = \pi_X(u_1)$ if and only if the expression $\pi_X(\bowtie_{1 \le i \le n} R_i)$ is lossless with
respect to A.

Proof: The proof is similar to the proof of the theorem. □

When  the  dependencies  are  not  typed,  then  Theorem  4  does  not  necessarily  hold,  as  the  next  example 
shows. 

Example 4: Let the universal relation scheme be ABCD and the relation schemes be AB, AC, and AD.
Suppose we have the following implicational dependency.

    A     B     C     D
    ---------------------
    a     b     c     d
    ---------------------
    a'    a     d     d'

That is, whatever symbols appear together in the A and D components must also appear together in the B
and C components.

Then consider the following universal relation u.

    a_1   b_1   c_1   d_1
    a_1   a_1   d_1   d_1

Let X = BC, so the projection of u onto X is $\{b_1c_1, a_1d_1\}$. If we project u onto AB, AC, and AD and
chase the representative instance, we can infer the existence of a tuple with middle components $a_1d_1$, but
not the existence of one with middle components $b_1c_1$. □

The motivation for monotonicity comes from our hope to duplicate by an expression the connection [X]
defined by the representative instance, and from the fact that the function defined by the representative
instance is monotone.† It is straightforward to show, whenever the representative instance is defined for the
two databases $(r_1, \ldots, r_k)$ and $(s_1, \ldots, s_k)$, i.e., both databases satisfy the given dependencies A, that the
condition $r_i \subseteq s_i$ for all i implies $[X](r_1, \ldots, r_k) \subseteq [X](s_1, \ldots, s_k)$. Thus [X] defined by the restricted
projection of the representative instance has the monotonicity property.

IV.  Representative  Instances,  Lossless-Monotone  Expressions,  and  Tableau  Mappings 

The broadest observation we can make is that the lossless, monotone expression approach to defining [X] can
only  produce  tuples  that  we  get  from  the  representative  instance.  In  this  section  we  also  introduce  tableau 
expressions  and  explore  their  relationship  to  the  representative  instance. 

Containment  of  Lossless-Monotone  Expressions  within  Representative  Instances 

Theorem 5: Let E be an expression that is monotone, lossless with respect to some set of dependencies A,
and produces a relation over X. We do not constrain A, except that it must consist only of dependencies for
which the “chase” process succeeds in producing a (finite or infinite) representative instance that satisfies A,
e.g., the implicational dependencies. Then if tuple t is in $E(r_1, \ldots, r_n)$, it follows that t is in $\pi{\downarrow}_X(u)$, where
$u = RI(r_1, \ldots, r_n)$.

† Another such property, which does not play a role here, is the containment condition, which says that if X and Y are two
sets of attributes, and $X \subseteq Y$, then for all database states d, $\pi_X([Y](d)) \subseteq [X](d)$. That is, whatever connection among the
attributes of Y is represented by the database, the connection for X is an essential part of it.

Proof: Let $R_i$ be the relation scheme for $r_i$, and let $s_i = \pi_{R_i}(u)$. Then $r_i \subseteq s_i$, since u is $RI(r_1, \ldots, r_n)$.
By monotonicity, t is in $E(s_1, \ldots, s_n)$. As E is lossless with respect to A, and u satisfies A, t must be in
$\pi_X(u)$, and since t has no null symbols, it is in $\pi{\downarrow}_X(u)$. □

Tableau  Mappings 

We wish to deal with the question of when there is an expression, particularly a first-order formula (i.e.,
an expression of relational algebra), to simulate the effect of the representative instance.† To make this
characterization  we  need  to  introduce  two  concepts.  The  first  is  tableau  mappings,  which  are  expressions 
that  can  be  denoted  by  tableaux  as  in  <ASU>,  and  the  second,  which  we  call  “bounded”  database  schemes, 
involves  a  strong  limit  on  the  length  of  the  chase  needed  to  deduce  that  a  particular  tuple  is  in  [X]  during 
the  construction  of  the  representative  instance. 

For our purposes, both tableaux and (embedded) implicational dependencies will be represented in the
same notation, $(t_1, \ldots, t_k)/t$, where the $t_i$'s and t are rows of abstract symbols. The components of these rows
correspond in a fixed, understood order, to the attributes of the universal relation scheme. The positions of
t are either blank or are symbols that appear at least once among the $t_i$'s. If a dependency is represented,
t could alternatively be an equality a = b between two symbols appearing among the $t_i$'s. If a tableau is
represented, then the $t_i$'s could be tagged by relation schemes or relation names; we write $t_i\,(R_i)$ to indicate
that row $t_i$ is tagged by relation $R_i$. In that case, we expect that every position of $t_i$ that does not correspond
to an attribute in the relation scheme for $R_i$ will have a unique symbol, one that appears nowhere else in
the tableau. Normally, we represent unique symbols by blanks.

As dependencies, we call the $t_i$'s hypothesis rows and t the conclusion row. The notation being used
here, and the meaning attributed to these dependencies is defined further in <SU, Ul>, and in <BV, F>,
although different notation is used in the latter papers. Roughly, the dependency says that whenever we
see tuples that look like the hypothesis rows, in the sense that there is a mapping of symbols that makes
all the hypothesis rows become tuples of the relation, then there is also some tuple in the relation that is
the conclusion row after mapping of the symbols, with blank positions mapped to arbitrary symbols; if the
conclusion is a = b, then instead we require that this arbitrary symbol mapping has in fact mapped a and
b to the same symbol, because the relation allowed no other possibility.

† Note that there is always a second-order formula to simulate the representative instance, since we may thus express the
condition that a relation with the properties of the representative instance exists.

    E   D   O   P
    -------------
    e   d       p    (EDP)
    e       o        (EO)
            o   p    (OP)
    -------------
        d   o

Fig. 3. A tableau expression or implicational dependency.

As an expression, the $t_i$'s are called rows, and t is called the summary. The meaning of such a mapping
is described in <ASU>. Informally, it means that the result of the mapping applied to a universal relation
u is found by taking each possible mapping of symbols that makes each row a tuple in u and placing in
the result the tuple that is formed by applying this symbol mapping to the summary. We can also apply a
tableau mapping not to the universal relation but to a collection of relations $r_1, \ldots, r_n$ over relation schemes
$R_1, \ldots, R_n$ that are each subsets of the universal relation scheme. In this case we require that for each row
$t_j$, tagged by some relation R, we have $R = R_i$ for some i, $1 \le i \le n$, and that the symbol mapping sends
$t_j$, restricted to $R_i$, to some tuple of $r_i$.
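
The informal semantics of a tagged tableau mapping can be sketched in Python as a backtracking search over symbol mappings. This is an illustrative sketch, not the formal definition of <ASU>: the function name `apply_tableau`, the dictionary-based representation of rows and relations, and the sample data are all assumptions. Blank positions are simply omitted from each row pattern.

```python
def apply_tableau(rows, summary, db):
    """Apply a tagged tableau mapping to a database.

    rows:    list of (relation_name, {attr: symbol}) patterns
    summary: {attr: symbol}; every symbol must occur in some row
    db:      {relation_name: list of {attr: value}}
    """
    results = set()

    def extend(i, binding):
        if i == len(rows):
            # Apply the completed symbol mapping to the summary.
            results.add(tuple(sorted((a, binding[s]) for a, s in summary.items())))
            return
        name, pattern = rows[i]
        for tup in db[name]:
            new = dict(binding)
            # A symbol must map to the same value everywhere it occurs.
            if all(new.setdefault(sym, tup[attr]) == tup[attr]
                   for attr, sym in pattern.items()):
                extend(i + 1, new)

    extend(0, {})
    return results


# The tableau of Fig. 3: rows tagged EDP, EO, OP with summary (d, o).
fig3_rows = [("EDP", {"E": "e", "D": "d", "P": "p"}),
             ("EO", {"E": "e", "O": "o"}),
             ("OP", {"O": "o", "P": "p"})]
fig3_summary = {"D": "d", "O": "o"}

db = {"EDP": [{"E": "ann", "D": "toys", "P": "x1"},
              {"E": "bob", "D": "shoes", "P": "x2"}],
      "EO": [{"E": "ann", "O": "rm1"}, {"E": "bob", "O": "rm2"}],
      "OP": [{"O": "rm1", "P": "x1"}, {"O": "rm2", "P": "x9"}]}

print(apply_tableau(fig3_rows, fig3_summary, db))
```

On this sample data only the employee whose office and phone are linked in OP contributes a department-office pair, which matches the reading of the mapping as a join of the three relations projected onto DO.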

Lemma 1: Let $E = (t_1, \ldots, t_n)/t$ be a tableau expression, and A a set of implicational dependencies. Then
E is lossless with respect to A if and only if $A \models E$ when E is treated as an implicational dependency.

Proof: E is lossless if and only if it is contained in the expression that is the projection onto those attributes
in which t has a nonblank, that is, in the result of the tableau mapping $t'/t$, where $t'$ is t with blanks replaced
by new symbols. By the test of <ASU>, this containment holds only if, after chasing $\{t_1, \ldots, t_n\}$ by the
dependencies in A, $t'$ can map to one of the resulting rows. But that is exactly the condition under which
$A \models E$. □

Example 5: In Fig. 3 we see an expression or dependency based on the database of Example 3. We follow
the convention of using blanks not only in the summary/conclusion, but everywhere that a symbol appearing
only once is found. As a tagged tableau mapping, it produces the natural join of the three relations EDP,
EO, and OP, projected onto DO. As a dependency, it says that d and o appear together in a tuple of each
universal relation in which for some e and p, there is a tuple in which e, d, and p appear together, another
tuple in which e and o appear, and a third in which o and p appear, all in their appropriate columns. As a
consequence of this dependency, the above expression $\pi_{DO}(\bowtie\{EDP, EO, OP\})$ is lossless. □

Bounded  Database  Schemes 

A database scheme is a finite set of relation schemes and a finite set of dependencies that apply to the
universal relation scheme that is the union of the given relation schemes. We denote the database scheme
by $D = (A, \{R_1, \ldots, R_n\})$, where A is the dependencies and $R_1, \ldots, R_n$ the relation schemes. We say that
two database schemes are equivalent if

1. their relation schemes are the same, and

2. their dependencies are logically equivalent, i.e., the same universal relations satisfy both sets of
dependencies.

Let $D = (A, \{R_1, \ldots, R_n\})$ be a database scheme. We say D is k-bounded (for set of attributes X)
if for any relations $r_1, \ldots, r_n$ over the $R_i$'s, if $u = RI(r_1, \ldots, r_n)$, and t is in $\pi{\downarrow}_X(u)$, then we can deduce
that fact by a sequence of at most k applications of dependencies in A, starting with the $r_i$'s (padded with
blanks as in Fig. 1). D is bounded if it is k-bounded for some k. Note that “bounded” says more than that
the chase terminates in a finite relation for any $r_i$'s. It says that sequences of k dependency applications
suffice independently of the initial $r_i$'s. As we shall see, “bounded,” “k-bounded,” and “1-bounded” all are
equivalent statements, in the sense that any bounded database scheme is equivalent to a 1-bounded scheme.

V. Lossless Tableau Mappings and Representative Instances

We shall now develop our characterization of when the representative instance can be simulated by a
first-order formula. In particular, we show the equivalence of the following three statements about a database
scheme D, whose dependencies are implicational, and a set of attributes X.

1. D is bounded for X (and in fact 1-bounded).

2. There is a first-order formula that computes $\pi{\downarrow}_X(RI(r_1, \ldots, r_n))$, i.e., there is an expression of relational
algebra that simulates the representative instance.

3. $\pi{\downarrow}_X(RI(r_1, \ldots, r_n))$ is computed by a finite union of tableau mappings.

Theorem 6: Let $D = (A, \{R_1, \ldots, R_n\})$ be a database scheme, where A is a set of implicational
dependencies. Then there is, for each set of attributes X, a finite set of lossless tableau expressions whose union
yields the same relation as $\pi{\downarrow}_X(u)$, where u is $RI(r_1, \ldots, r_n)$, if and only if D is equivalent to some bounded
scheme.

Proof: 

If: We may as well assume that D itself is bounded. Let k be the bound on the number of dependencies
that need to be applied, and let m be the maximum number of hypothesis rows in any member of A. Then
consider every tagged tableau expression E whose summary has nonblanks exactly in the positions for the
attributes in X, and that has at most km rows. Depending on the equalities of various symbols among the
rows of the tableau, E may or may not be guaranteed to produce tuples over X that will be produced when
the representative instance is chased by A. If $A \models E$, then we have such a guarantee; if not then we don’t.

Fig. 4. Chase of unbounded length.

Moreover, since D is bounded, all tuples found to be in the representative instance projected onto X
will be generated by some such expression that is logically implied by A when the expression is treated as a
dependency. Also, there is evidently only a finite number of such dependencies. We therefore can compute
$\pi{\downarrow}_X(u)$ by taking the union of all those expressions whose tableaux’ summaries have distinguished symbols
in exactly the columns of X, and that are logically implied by A, when treated as dependencies.

Only if: Suppose that there is a lossless expression $E_X$ that is the finite union of tableau expressions and
produces the same result as $\pi{\downarrow}_X(u)$. Convert the set of $E_X$'s to a set of dependencies, say A'. Since the
$E_X$'s are lossless, $A \models A'$. It follows that A is equivalent to $A \cup A'$.

Clearly, one step of dependency application using A' serves to obtain enough tuples in u to prove that
$\pi{\downarrow}_X(u)$ will contain whatever $E_X$ produces, so the scheme with dependencies $A \cup A'$, which is equivalent
to D, is 1-bounded. □

Example 6: Consider the database scheme $D = (\{A \rightarrow B, C \rightarrow B\}, \{AB, AC\})$. We claim that D is not
bounded, and therefore the universal relation defined by its representative instance cannot be simulated by
any finite union of lossless tableau expressions. To see informally why D is not bounded, consider the relations
$\{a_1b_1\}$ for AB and $\{a_1c_1, a_2c_1, a_2c_2, a_3c_2, \ldots, a_nc_n\}$ for AC. The initial representative
instance is shown in Fig. 4(a), and the chased version in Fig. 4(b). Note that deleting any tuple from AC
means that $a_nb_1c_n$ will no longer appear in the chase.

Suppose now that D is bounded. Whatever set of dependencies equivalent to $\{A \rightarrow B, C \rightarrow B\}$ we choose,
the existence of a tuple $a_nb_1c_n$ will have to be inferred from only a finite number of the tuples in AC.
However, by making n large enough, we can force the inference to be made without looking at some $a_ic_i$.
Then, by deleting this tuple from AC, we can force a situation where the dependencies $\{A \rightarrow B, C \rightarrow B\}$
no longer imply the presence of tuple $a_nb_1c_n$, yet the chase using the equivalent set of dependencies still
produces that tuple. □
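
The growth of the chase in this example can be observed with a short Python sketch. It is a deliberately simplified stand-in for the chase, written only for the two FDs with right side B: it fills in the B-value of each AC tuple from any tuple that shares its A-value or its C-value, and counts the fill-in steps. It does not equate nulls with each other or detect inconsistencies, and the names `chain` and `chase_b_values` are hypothetical.

```python
def chain(n):
    """Build the AC relation a1c1, a2c1, a2c2, a3c2, ..., ancn."""
    ac = []
    for i in range(1, n + 1):
        if i > 1:
            ac.append((f"a{i}", f"c{i - 1}"))
        ac.append((f"a{i}", f"c{i}"))
    return ac


def chase_b_values(ab, ac):
    """Propagate B-values through AC using the FDs A -> B and C -> B.

    Returns ({(a, c): b}, number of fill-in steps performed).
    """
    b_of_a = dict(ab)   # B-value determined by an A-value
    b_of_c = {}         # B-value determined by a C-value
    known, steps, changed = {}, 0, True
    while changed:
        changed = False
        for a, c in ac:
            if (a, c) in known:
                continue
            b = b_of_a.get(a, b_of_c.get(c))
            if b is not None:
                known[(a, c)] = b
                b_of_a[a] = b
                b_of_c[c] = b
                steps += 1
                changed = True
    return known, steps


n = 5
known, steps = chase_b_values([("a1", "b1")], chain(n))
print(known[(f"a{n}", f"c{n}")], steps)

# Deleting one link of the chain cuts off the last tuple from b1.
broken = [t for t in chain(n) if t != ("a3", "c2")]
known2, _ = chase_b_values([("a1", "b1")], broken)
print((f"a{n}", f"c{n}") in known2)
```

In this sketch the number of steps needed to reach the last tuple grows with the length of the chain, which is the informal reason no fixed bound k can work for every database.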

Now let us turn our attention to the second equivalence, that between arbitrary first-order formulas and
finite unions of lossless tableau mappings.

Theorem 7: Let $D = (A, \{R_1, \ldots, R_n\})$ be a database scheme, where A is a set of implicational
dependencies, and let X be a set of attributes. Then $\pi{\downarrow}_X(RI(r_1, \ldots, r_n))$ is expressed by a first-order formula if
and only if it is expressed by some finite union of lossless tableau mappings.

Proof: The “if” portion is trivial. For the “only if” part, we observe that every time we add a tuple in the
chase, we can find a tableau mapping that yields the projection of this tuple onto X. We find this tableau by
first expressing as an implicational dependency the fact that this tuple is inferred from a finite set of tuples
of the original relations. The hypotheses of this dependency will each have a particular relation scheme in
whose columns all the nonunique symbols appear; thus it can be viewed as a tagged tableau mapping. By
Lemma 1, the losslessness of this mapping follows from A. It follows that there is some (possibly infinite)
set of lossless tableau mappings whose union yields $\pi{\downarrow}_X(RI(r_1, \ldots, r_n))$.

Let these mappings be $T_1, T_2, \ldots$, and let $Q_1, Q_2, \ldots$ be the relations over X produced by these
mappings.† We know that $Q_1 \cup Q_2 \cup \cdots$ is equal to $\pi{\downarrow}_X(RI(r_1, \ldots, r_n))$. There is an unexpected difficulty
here because we cannot refer to this infinite union by a first-order sentence. However, we can say that the
relation R is a superset of this union by using an infinite set of sentences. In what follows, we use $R_i$ as
a relation symbol that stands for the relation $r_i$ in the database scheme, and we use R as an arbitrary
relation over X.

† The reader should be aware that in this proof, we represent relations by predicate symbols, $Q$'s and $R$'s. Thus, $Q$ really
stands for the predicate $Q(t)$ that is true if and only if tuple $t$ is in the relation we call $Q$. We shall continue to treat such
predicate symbols as if they were relations, e.g., by applying algebraic operators to them.

By <GV>, we can find a (possibly infinite) set of implicational dependencies $A'$ over $R_1, \ldots, R_n$
asserting that these relations are consistent, in the sense that when we chase these relations, we do not
attempt to equate two different nonnull symbols, and therefore, the representative instance exists. Further,
we can construct for each $T_i$ a first-order formula $\psi_i(t, R_1, \ldots, R_n)$ that asserts that tuple $t$ is in $Q_i$. Let
$\phi(t, R_1, \ldots, R_n)$ be the hypothetical first-order formula that says of tuple $t$ that $t$ is in $\pi_{\downarrow X}(u)$, where $u$ is the
representative instance constructed from the relations for which the $R_i$'s stand. Then we have the following
logical implication.

$$A' \cup \{ (\forall t)(\psi_i(t, R_1, \ldots, R_n) \rightarrow R(t)) \mid i = 1, 2, \ldots \} \models (\forall t)(\phi(t, R_1, \ldots, R_n) \rightarrow R(t))$$

The reason that this implication holds is as follows. In a model of the left-hand side, the relations $r_1, \ldots, r_n$
for $R_1, \ldots, R_n$ constitute a database that has a weak universal relation, and the relation for $R$ contains all
the $Q_i$'s. Thus it contains $\pi_{\downarrow X}(RI(r_1, \ldots, r_n))$, and therefore it contains the relation produced by $\phi$. Thus
the model satisfies $(\forall t)(\phi(t, R_1, \ldots, R_n) \rightarrow R(t))$.

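By compactness, there is a finite subset of the $\psi_i$'s,‡ which we may take to be $\psi_1, \ldots, \psi_k$, such that

$$A' \cup \{ (\forall t)(\psi_1(t, R_1, \ldots, R_n) \rightarrow R(t)), \ldots, (\forall t)(\psi_k(t, R_1, \ldots, R_n) \rightarrow R(t)) \} \models (\forall t)(\phi(t, R_1, \ldots, R_n) \rightarrow R(t))$$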

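Now let $R$ be $Q_1 \cup \cdots \cup Q_k$, that is, the union of the results of applying the $k$ tableau mappings to the
given relations. Thus $R(t)$ is logically equivalent to $Q_1(t) \vee \cdots \vee Q_k(t)$. Then $(\forall t)(\psi_i(t, R_1, \ldots, R_n) \rightarrow R(t))$ is
surely true for $1 \le i \le k$, so

$$A' \models (\forall t)(\phi(t, R_1, \ldots, R_n) \rightarrow (Q_1(t) \vee \cdots \vee Q_k(t)))$$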

That is, $\pi_{\downarrow X}(u) \subseteq (Q_1 \cup \cdots \cup Q_k)$, where $u$ is, as before, the representative instance constructed from the
relations of the database.

Containment in the opposite direction is obvious, since the $T_i$'s were constructed to mimic what the
chase does, and each $Q_i$ is the result of applying $T_i$ to the relations of the database. Thus for all database
relations $r_1, \ldots, r_n$ that satisfy $A'$ (i.e., we can successfully chase the $r_i$'s to construct a representative
instance $u$) we have that $\pi_{\downarrow X}(u)$ equals the union of the tableau mappings $T_1, \ldots, T_k$ applied to $r_1, \ldots, r_n$.
□ 

Query Optimization

Our  main  interest  in  the  computational  approach  to  the  universal  relation  model  comes  from  a  practical 
consideration  of  computational  efficiency;  we  do  not  want  the  expressions  computing  the  [X]’s  to  be  too 
complicated.  Thus,  naturally,  the  issue  of  optimizing  the  expression  to  compute  [X]  is  of  paramount  interest. 
For example, Sagiv <Sal, Sa2> takes only minimal extension joins to produce the answers to queries, and
in limited cases, proves that these simple expressions suffice to compute [X].

An interesting consequence of Theorem 7 is that we can use the weak optimization technique of <ASU>
to optimize our expression. By that theorem, we have to deal only with unions of lossless tableau mappings.
Let $T$ be such a tableau. We can view $T$ as a tagged tableau that defines a mapping on relations over
$R_1, \ldots, R_n$, as an untagged tableau that defines a mapping on universal relations, or as a dependency on
universal relations. Suppose that $T'$ is another tagged tableau, obtained by removing some of the rows of
$T$, that is weakly equivalent to $T$. That is, $T'$ is equivalent to $T$ when $T$ and $T'$ are considered as untagged
tableaux. Clearly, $T$ is contained in $T'$ when they are considered as tagged tableaux; i.e., when applied to
relations over $R_1, \ldots, R_n$, $T'$ produces all tuples that $T$ produces. Now, $T$ is lossless with respect to the set
of dependencies A, so $A \models T$, when $T$ is considered as a dependency, and consequently, $A \models T'$, when $T'$
is considered as a dependency. Thus, $T'$ as a tagged tableau produces only tuples that are produced by the
representative instance. As a consequence, $T'$ can replace $T$ in the union of tableau mappings.

‡ and incidentally, a finite subset of $A'$.

Weakly  optimized  joins  on  maximal  objects  are  used  in  System/U  in  order  to  compute  connections 
<U2>.  The  motivation  there  is  given  by  appealing  to  the  way  dangling  tuples  are  treated;  this  argument 
is  intuitively  reasonable  but  without  mathematical  foundations. 
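The row-removal step described above can be illustrated by a small sketch. It is not the algorithm of <ASU>; it is a brute-force illustration under several assumptions: tags are omitted (the tableau is treated as untagged, which is the setting in which weak equivalence is defined), dependencies are ignored, rows are tuples of symbol names, and the set `fixed` holds the symbols (distinguished variables or constants) that a homomorphism must leave unchanged. The names `find_homomorphism` and `remove_redundant_rows` are hypothetical. A row may be dropped when the whole tableau can be mapped homomorphically into the remaining rows.

```python
def find_homomorphism(src_rows, dst_rows, fixed):
    """Search for a symbol mapping h, identity on `fixed`, that sends
    every row of src_rows onto some row of dst_rows. Returns h or None."""
    def extend(i, h):
        if i == len(src_rows):
            return h
        for target in dst_rows:
            h2 = dict(h)  # work on a copy so a failed attempt leaves h intact
            if all(
                (s == t) if s in fixed else (h2.setdefault(s, t) == t)
                for s, t in zip(src_rows[i], target)
            ):
                found = extend(i + 1, h2)
                if found is not None:
                    return found
        return None

    return extend(0, {})


def remove_redundant_rows(rows, fixed):
    """Greedily drop rows while the untagged tableau stays equivalent."""
    rows = list(rows)
    changed = True
    while changed:
        changed = False
        for i in range(len(rows)):
            smaller = rows[:i] + rows[i + 1:]
            if smaller and find_homomorphism(rows, smaller, fixed) is not None:
                rows = smaller
                changed = True
                break
    return rows


# Hypothetical tableau over columns (A, B, C); "a" and "c" are distinguished.
tableau = [("a", "b1", "c1"), ("a2", "b1", "c"), ("a", "b2", "c3")]
reduced = remove_redundant_rows(tableau, {"a", "c"})
```

In this example the third row folds into the first (by mapping `b2` to `b1` and `c3` to `c1`), while the first two rows cannot be mapped onto each other because each contains a different distinguished symbol.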

Storage  of  Query  Interpretation  Information 

Naturally, we do not wish to store, for the current database state d, all the views [X](d), where X ranges
over all sets of attributes. However, might it be feasible to store expressions for calculating [X](d) from d for
all X? The implication of our developments is that if any first-order way of computing connections exists,
then we can establish for each set of attributes X a finite set of tableaux whose mappings together produce
[X]. However, if there are, say, 100 attributes in the universal scheme, it does not seem realistic to store all
the expressions needed to reconstruct the [X]'s.

Existing universal relation systems have mechanisms for constructing the expressions for [X] “on the
fly.” For example, System/U <U2> stores only the maximal objects, and obtains [X] by reductions of the
expressions for the maximal objects.

It appears that we can take something like this approach in general. If connections are defined by
the representative instance, then for any set of attributes X and attribute A not in X, $[X] \supseteq \pi_X([XA])$.
Thus, [X] is at least the union of the projections of all [XA]'s. If [X] is exactly equal to the union of these
projections, then we need not store an expression for [X]. The only X's for which we need to store a formula
are those for which [X] is a proper superset of $\bigcup_A \pi_X([XA])$; these were called implicit objects in <Mal>,
because they generalized the idea of constructing maximal objects as in <MU>.
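As an illustration of the comparison just described, the following sketch checks, on a single database state, for which attribute sets X the relation [X] strictly contains the union of the projections of the [XA]'s. It assumes that a representative instance has already been computed and is given as a list of rows with `None` standing for a null; the function names and the sample data are hypothetical. A strict containment on one state is only a witness that X needs its own expression; equality on one state proves nothing about the expressions in general.

```python
from itertools import combinations


def total_projection(rows, attrs):
    """Project onto attrs, keeping only tuples with no nulls on attrs."""
    return {
        tuple(r[a] for a in attrs)
        for r in rows
        if all(r[a] is not None for a in attrs)
    }


def needs_own_expression(rows, universe, X):
    """True if [X] strictly contains the union over A of pi_X([XA])."""
    X = tuple(X)
    direct = total_projection(rows, X)
    derived = set()
    for A in universe:
        if A in X:
            continue
        XA = X + (A,)
        derived |= {t[:len(X)] for t in total_projection(rows, XA)}
    return derived < direct  # derived is always a subset; test properness


def implicit_witnesses(rows, universe):
    """All attribute sets for which this state witnesses a strict gap."""
    return [
        X
        for k in range(1, len(universe) + 1)
        for X in combinations(universe, k)
        if needs_own_expression(rows, universe, X)
    ]


# Hypothetical representative instance over universe (A, B, C).
universe = ("A", "B", "C")
rep_instance = [
    {"A": 1, "B": 2, "C": None},
    {"A": None, "B": 2, "C": 3},
]
witnesses = implicit_witnesses(rep_instance, universe)
```

Here [AB] and [BC] each contain one tuple that cannot be obtained by projecting [ABC], which is empty, whereas [A], [B], and [C] are recovered from the two-attribute connections.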

VI. Concluding Remarks

We  have  explored  computational  methods  that  might  be  used  to  simulate  the  effect  of  the  representative 
instance.  Three  overlapping  classes  of  expressions  were  considered  as  possible  computation  methods: 

1. monotone,

2. lossless, and

3.  first-order. 

We also identified a class of expressions that is in the intersection of all three of these classes: unions of
lossless tableau mappings.

We showed that monotonicity and losslessness are properties that we should expect of any computational
method that simulates the representative instance, for the simple reason that the representative instance has
these properties. If we want a first-order method, i.e., an expression of relational algebra, then we find that
we need only consider finite unions of lossless tableau mappings, since all first-order methods are equivalent
to one of these. As a consequence, condition (3) above implies (1) and (2).

We would like to close by pointing out three shortcomings of the theory. First, though we have identified
the class of database schemes where the representative instance can be simulated by a first-order expression,
our characterization, the boundedness condition, is not effective; we do not know how to test whether a
schema is bounded or not. In fact, we do not even know whether this problem is solvable at all, even in
simple cases where only functional dependencies are given.

Secondly,  in  showing  that  if  there  is  any  first-order  expression  then  it  must  be  a  union  of  lossless  tableau 
mappings  (Theorem  7),  we  used  the  compactness  theorem.  But  in  order  to  use  compactness,  we  have  to  take 
into  account  both  finite  and  infinite  databases.  What  happens  if  we  restrict  ourselves  to  finite  databases? 
Conceivably,  there  could  be  a  first-order  expression  that  is  equivalent  to  an  infinite  union  of  lossless  tableau 
mappings,  but  is  not  equivalent  to  any  finite  union  of  such.  In  some  limited  cases,  such  as  Example  6,  we 
can show that this is not the case by more involved arguments. But these arguments do not lend themselves
to generalization.

Third,  there  are  reasons  why  the  representative  instance  approach  does  not  support  all  the  semantics 
that  we  might  wish  for  in  a  universal  relation  system,  and  we  doubt  that  it  will  serve  as  the  “ultimate” 
universal  relation  model.  Some  problems  that  we  see  as  forcing  awkwardness  in  the  way  universal  relation 
systems  are  used  are  the  following. 

1. A representative instance system answers queries by intersecting the weak instances, and then applying
the query (Theorem 1). However, if the weak instances are all the possible universal relations that the
user might see, it may make more sense to apply the query to all the weak instances, and then take the
intersection of the results. This approach cannot produce more than the method of query interpretation
in which we compute the representative instance first, but there are examples where it produces less,
notably when a join on “not equal” is involved in the query.

2. The representative instance allows us to infer equalities among nulls, but, since nulls are projected out
before we compute the answer to the query, these equalities cannot influence the answer. For example,
we might deduce that employees Smith and Jones have the same manager because they are in the same
department, yet not know their manager, because the information is not in the database. In answer to
the query “list pairs of employees with the same manager,” we would not list the pair (Smith, Jones).

3. The representative instance approach supports only one notion of nulls, generally referred to as “missing
value nulls.” We might wish to restrict the ability assumed in the representative instance approach
to extend any tuple in any relation of the database to the universal set of attributes. Perhaps it is
better to extend tuples only in limited ways, leaving certain positions in tuples “blank” and allowing
no dependency applications at all involving blanks. The effect of this restriction is that the universal
relation is split into several relations with overlapping, but distinct sets of attributes.
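The Smith and Jones example in item 2 can be traced with a minimal sketch of a chase restricted to functional dependencies. Everything here is an assumption made for illustration: the `Null` class, the helper names, the relation schemes (Emp, Dept) and (Dept, Mgr), the dependencies Emp → Dept and Dept → Mgr, and the sample tuples. The sketch pads tuples with fresh nulls, equates symbols as the dependencies require, and then takes the total projection.

```python
from itertools import count


class Null:
    """A marked null: each instance stands for a distinct unknown value."""
    _ids = count()

    def __init__(self):
        self.id = next(Null._ids)

    def __repr__(self):
        return f"?{self.id}"


def pad(tuples, scheme, universe):
    """Extend each tuple over `scheme` to the universe, using fresh nulls."""
    rows = []
    for t in tuples:
        row = {a: Null() for a in universe}
        row.update(zip(scheme, t))
        rows.append(row)
    return rows


def substitute(rows, old, new):
    """Replace every occurrence of the null `old` by `new`."""
    for row in rows:
        for attr in list(row):
            if row[attr] is old:
                row[attr] = new


def chase_fds(rows, fds):
    """Chase with FDs given as (lhs_attrs, rhs_attr) until nothing changes."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if any(r1[a] != r2[a] for a in lhs) or r1[rhs] == r2[rhs]:
                        continue
                    v1, v2 = r1[rhs], r2[rhs]
                    if isinstance(v1, Null):
                        substitute(rows, v1, v2)
                    elif isinstance(v2, Null):
                        substitute(rows, v2, v1)
                    else:
                        raise ValueError("two distinct nonnull symbols equated")
                    changed = True
    return rows


def total_projection(rows, attrs):
    """Project onto attrs, dropping any tuple with a null on attrs."""
    return {
        tuple(r[a] for a in attrs)
        for r in rows
        if not any(isinstance(r[a], Null) for a in attrs)
    }


universe = ("Emp", "Dept", "Mgr")
rows = pad([("Smith", "Toys"), ("Jones", "Toys")], ("Emp", "Dept"), universe)
rows += pad([("Shoes", "Lee")], ("Dept", "Mgr"), universe)
chase_fds(rows, [(("Emp",), "Dept"), (("Dept",), "Mgr")])

same_manager_inferred = rows[0]["Mgr"] is rows[1]["Mgr"]
emp_mgr = total_projection(rows, ("Emp", "Mgr"))
```

After the chase, the manager positions of the Smith and Jones rows hold the same null, so the equality has been inferred. The total projection onto (Emp, Mgr) is nevertheless empty, so a query over that projection cannot report the pair.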

We hope to discuss these issues and propose an “improved” representative instance that supports many
of the concepts developed here in a forthcoming paper <MUV>.